Anaphors in Sanskrit

Authors

  • Girish Nath Jha
  • Surjit Kumar Singh
  • Pravin Pralayankar
Abstract

Research in building robust NLP systems with ambiguity resolution techniques has gained momentum in recent years. In particular, anaphora resolution initiatives have reached unprecedented heights in the last 10 years or so. Mitkov et al. (2001) have reported both rule-based, knowledge-based approaches and machine learning based ‘knowledge poor’ approaches in an ACL issue devoted to this subject. Mitkov (2001a) has also presented outstanding issues and challenges in this area. Johansson (ed., 2007) reports the latest developments in this area of research and development. Indian languages in general, and Sanskrit in particular, have not been extensively studied from these perspectives. Barring a notable exception (Sobha 2007), Sanskrit anaphors have rarely been examined from a computational perspective. The case of Sanskrit has been more severe for two reasons: a virtual absence of annotated corpora has made corpus-based machine learning approaches impossible, and a poor understanding of Pāṇini’s grammar from a computational perspective has made it difficult to apply rule-based approaches. While some work on Indic languages, such as that of Hock (1991) and Davison (2006), has looked at diverse syntactic issues, often not excluding anaphora, Shapiro (2003) has focused on lexical anaphors and pronouns in the languages of the subcontinent. Sobha et al. (1998, 1999a, 1999b, 2007) and Murthy et al. (2005) have looked into the anaphora cases for some Indian languages in great detail, and in particular for Sanskrit in their most recent paper (2007), as mentioned above. The authors of this paper look at the problem from a broader perspective. Since no effort has been made at a comprehensive documentation and classification of Sanskrit anaphora, this is the primary focus of the present study.
Similar to Soon, Ng and Lim (2001), the anaphora resolution presented here is proposed to be a part of the larger NLP system called Sanskrit Analysis System, parts of which have been developed by the principal author and his research students at the Sanskrit Center of Jawaharlal Nehru University, New Delhi (Jha et al. 2004, 2005, 2006, 2007, 2008). The paper has the following major sections:

  • Sanskrit and its linguistic tradition: This section gives a brief background of the Sanskrit language and its linguistic tradition for the benefit of general readers.
  • Anaphors in Sanskrit: This section is an attempt at discussing and classifying anaphora and anaphora-like cases occurring in a wide variety of Sanskrit prose texts: those of authors like Subandhu (Vāsavadattā), Bāṇabhaṭṭa (Kādambarī, Harṣacarita), Daṇḍī (Daśakumāracarita) and Ambikā Dutt Vyāsa (Śivarājavijaya), and popular didactic prose texts like Pañcatantra and Hitopadeśa. Though the focus of this paper is classical Sanskrit prose, examples from some popular poetic texts like Bhagavadgītā and Rāmāyaṇa have also been studied. The authors have looked at the above textual data to arrive at a classification of lexical, sentential and discourse anaphora in Sanskrit.
  • Anaphora handling in Sanskrit intellectual tradition: This section presents three major śāstric (scientific) traditions in India which have insights on this subject: the vyākaraṇa (grammar), navya-nyāya (logic) and mīmāṃsā (interpretation) schools. An attempt has been made to understand what solutions have been provided to handle such ambiguity at various levels of natural language.
  • Computational Framework: In this section, available computational models for anaphora resolution for Indic languages have been examined, and a new model has been attempted after incorporating inputs from the traditions of vyākaraṇa (grammar), mīmāṃsā (interpretation) and nyāya (logic).

In: Johansson, C. (Ed.), Proceedings of the Second Workshop on Anaphora Resolution (2008).
The anaphora resolution presented here is a subset of a larger web-based Sanskrit Analysis System implemented in Java (http://sanskrit.jnu.ac.in), which comprises the following components:

  • Sandhi analysis (http://sanskrit.jnu.ac.in/sandhi/viccheda.jsp)
  • Morph analysis for nouns (http://sanskrit.jnu.ac.in/subanta/rsubanta.jsp)
  • Morph analysis for verbs (http://sanskrit.jnu.ac.in/tanalyzer/tanalyze.jsp)
  • POS tagger (http://sanskrit.jnu.ac.in/post/post.jsp)
  • Gender recognition system (http://sanskrit.jnu.ac.in/grass/analyze.jsp)
  • Semantic class identification based on amarakosha (http://sanskrit.jnu.ac.in/amara/index.html)
  • Karaka analysis (http://sanskrit.jnu.ac.in/karaka/analyzer.jsp)

It depends on the information provided by the above components, such as morphosyntactic labels, morphological information, and case and gender information.

1. Sanskrit and its linguistic tradition

Sanskrit is the oldest documented language of the Indo-European family. The Ṛgveda (2000 BCE), the oldest text of this family, contains a sophisticated use of the pre-Pāṇinian variety, also called vaidikī. Pāṇini variously calls his mother tongue bhāṣā or laukikī. His grammar has two sets of rules: for vaidikī (the variety used in the Vedas) and for laukikī (the variety used by the common people). The term ‘Sanskrit’ (meaning ‘refined’) is given to the standard form of laukikī which emerged after Pāṇini’s grammar (700 BCE).

ISSN 1736-6305, Vol. 2, http://dspace.utlib.ee/dspace/handle/10062/7129

Many indologists have studied the evolution of Sanskrit in three phases:

1.1 Old Sanskrit (Vedic)

Vedic (vaidikī) is the oldest extant form of Sanskrit or Old Indo-Aryan (OIA). It roughly dates back to 2000 BCE and includes the four Vedas (ṛg-veda, sāma-veda, yajur-veda, atharva-veda), their pāṭhas (structured readings) and saṃhitā traditions, for example the taittirīya and maitrāyaṇī saṃhitās of the kṛṣṇa-yajurveda, and the vājasaneyī saṃhitā of the śukla-yajurveda.
Based on linguistic similarities, some scholars have also included the later pre-classical phase of Sanskrit, including the brāhmaṇas, āraṇyakas, upaniṣads and kalpasūtras, as Vedic.

1.2 Middle Sanskrit (pre-classical)

Middle Sanskrit consists of the brāhmaṇas, āraṇyakas and upaniṣads belonging to the saṃhitās, the kalpasūtras (kalpa vedāṅga), and the six vedāṅgas, or scientific disciplines required to be studied for understanding the Vedas. The first four of these dealt with linguistics:

  • śikṣā (pronunciation)
  • vyākaraṇa (grammar)
  • nirukta (etymology)
  • chanda (meter)
  • jyotiṣa (astronomy)
  • kalpa (ceremonial)

Besides these, many indices and lists were prepared for explaining the Vedic verses. These were called pariśiṣṭa (appendices explaining sūtras) and anukramaṇī (lists containing the order of verses and information on the organization of Vedic texts).

1.3 Classical Sanskrit

Classical Sanskrit includes the epics (mahābhārata and rāmāyaṇa), the 18 purāṇas, and a huge body of sāhitya (literature), many works of which laid the foundation of indological studies in the West.

1.4 Purpose of Linguistic studies in India

The ancient Indian scholars were preoccupied with linguistic studies for two basic reasons: to maintain the texts of the oral tradition, and to defend Vedic knowledge. As mentioned above, of the six ancillary disciplines required to understand the Vedas, four were for linguistic study. Kapoor (1993) has divided the Indian linguistic tradition into four phases:

Phase I: earliest times up to Pāṇini (2000-700 BCE)
Speculations in śruti texts, four of the six vedāṅgas (vyākaraṇa, chanda, nirukta, śikṣā), the work of Yāska, the ṛk prātiśākhya, and the ācāryas mentioned by Pāṇini.

Phase II: Pāṇini to Ānandavardhana (700 BCE to 9th CE)
The aṣṭādhyāyī of Pāṇini, the vārttika of Kātyāyana, the mahābhāṣya of Patañjali, the mīmāṃsāsūtra of Jaimini, the vākyapadīya of Bhartṛhari, and works on poetics from Bharata up to Ānandavardhana.
Phase III: Rāmacandra to Nāgeśa Bhaṭṭa (11th CE to 18th CE)
This phase includes pedagogical grammars based on Pāṇini's grammar, investigations into principles of grammar, and also attempts to apply the Pāṇinian model to describe other languages.

Phase IV: Franz Kielhorn onwards
This phase includes modern textual interpretations of language and the works of Kielhorn, Bhandarkar, Carudev Shastri, Katre and Dandekar, among many others.

2. Anaphors in Sanskrit

Sanskrit anaphors can be classified into two broad categories: anaphor proper and anaphor-like cases. Sobha et al. (2007) have taken the first category and considered only pronominals like tat (pronominal), yat (correlative pronoun), sva, ātman (reflexive), parasparam, anyonyam (reciprocal) and their inflected forms as anaphors. Pāṇini, in his sūtra sarvādīni sarvanāmāni, lists the following pronouns (sarvanāma): sarva, viśva, ubha, ubhaya, ḍatara, ḍatama, anya, anyatara, itara, tvat, tva, nema, sama, sima; pūrva, para, avara, dakṣiṇa, uttara, apara, adhara (when not used as nouns); sva (if not used for family or wealth); antara; tyad, tad, yad, etad, idam, adas, eka, dvi, yuṣmad, asmad, bhavatu, kim (35 in all).

These can be categorized as follows:

  • sarvādi (14), including the two sets formed with ḍatara and ḍatama, which are suffixes that form comparatives and superlatives
  • pūrva, para, avara, dakṣiṇa, uttara, apara, adhara (when not used as nouns)
  • sva (if not used for family or wealth; in the example hariḥ svān vedārtham avedayat it is used as a noun in the sense of family or wealth), and ātman (though not a pronoun, it can occur as a reflexive anaphor)
  • antara (a noun if used in the sense of undergarments)
  • tyad, tad, yad, etad, idam, adas, eka, dvi, yuṣmad, asmad, bhavatu, kim

Chandrashekar (2007) classifies pronouns in Sanskrit as follows:

  • SN: Sarva Nāman (Pronoun Other, with gender, number, and declensional sub-tags) (e.g., anyaḥ, aparā)
  • SNU: Sarva Nāman Uttama (Pronoun First Person, with number and declensional sub-tags) (e.g., asmad)
  • SNM: Sarva Nāman Madhyama (Pronoun Second Person, with number and declensional sub-tags) (e.g., tvad)
  • SNA: Sarva Nāman Ātman (Pronoun Reflexive, with or without gender, number, and declensional sub-tags) (e.g., nijaḥ, svasya)
  • SNN: Sarva Nāman Nirdeśātmaka (Pronoun Demonstrative, with gender, number, and declensional sub-tags) (e.g., idam, saḥ)
  • SNP: Sarva Nāman Prāśnārthika (Pronoun Interrogative, with gender, number, and declensional sub-tags) (e.g., kim, kad)
  • SNS: Sarva Nāman Sāmbandhika (Pronoun Relative, with gender, number, and declensional sub-tags) (e.g., yaḥ, yā)

2.1 Classification of anaphors in Sanskrit

Here we deal with anaphora and similar elements in Sanskrit language usage. To view them in a broader perspective, these cases are first divided into ‘anaphora proper’ and ‘anaphora-like’ cases. ‘Anaphora proper’ can be divided into four categories: pronominal, reflexive, reciprocal and correlative.

a) Pronominal anaphors, where one or more pronouns refer to some other noun or pronoun.

1. [rāmaḥ[i] gṛham āgataḥ]MC [saḥ[i] ca mayā dṛṣṭaḥ]MC
   Rama[i] to-home came he[i] and[CONJ] by-me was-seen

2. [rāmaḥ[i] gṛham āgataḥ]MC [aham [ca] tam[i] apaśyam]MC
   Rama[i] to-home came I and[CONJ] him[i] saw

In the first part of the next sentence (3a), an NP antecedent precedes the pronominal anaphor in the same clause. The latter part (3b) is an example of a correlative anaphor and will be discussed later.

3.a.
   sattva-anurūpā[i]    self-nature[adj.f.nom.sg]i
   sarvasya[i]          all[pro.m.gen.sg]i
   śraddhā              faith[f.nom.sg]
   bhavati              be[pres.III.sg]
   bhārata              arjuna[m.voc.sg]

3.b.
   śraddhā-mayaḥ        faith-ful[adj.m.nom.sg]
   ayam                 this[pro.m.nom.sg]
   puruṣaḥ              person[m.nom.sg]j
   yaḥ                  who[pro.m.nom.sg]j
   yat-                 which[pro.m.nom.sg]k
   śraddhaḥ             faithful[adj.m.nom.sg]
   saḥ                  he[pro.m.nom.sg]j
   eva                  same[indecl]
   saḥ                  he[pro.m.nom.sg]k

Both 3.a and 3.b are from Gītā 17.3 and read as: Arjuna! Each person’s[i] faith is according to his/her own[i] nature. A person[j] having faith is identical with the person/thing he/she[j] has faith in.

4.
   ātma-aupamyena       self[pro]-comparison[n.ins.sg]i
   sarvatra             everywhere[indecl]
   samam                equally[indecl]
   paśyati              see[pres.III.sg]
   yaḥ                  who[pro.m.nom.sg]i
   arjuna               arjuna[m.voc.sg]
   sukham               happiness[n.acc.sg]
   vā                   or[indecl]
   yadi                 if[indecl]
   vā                   or[indecl]
   duḥkham              grief[n.acc.sg]
   saḥ[i]               he[pro.m.nom.sg]
   yogī                 yogī[m.nom.sg]
   paramaḥ              absolute[adj.m.nom.sg]
   mataḥ                regarded[adj.m.nom.sg]

Arjuna! One who[i] sees happiness and sorrow equally, as compared with oneself[i], he[i] is regarded as the absolute yogī. (Gītā 6.32)

b) Reflexive anaphors, where reflexive pronouns or words such as sva, svayam, svayameva, ātma, ātmanā and other forms of these refer to some noun or pronoun in their left or right context.
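The 'anaphora proper' categories and the coindexation in example 1 can be illustrated with a minimal Python sketch. It makes several assumptions. The stem lists are only the ones quoted in this text. The names `ANAPHOR_STEMS`, `Token`, `classify_anaphor` and `find_antecedent` are hypothetical. The stem, gender and number values are annotated by hand here, whereas in the system described they would presumably come from the morphological and gender-recognition components. The nearest-agreeing-noun heuristic is a deliberate simplification for illustration, not the model the authors propose.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stems quoted in the text, grouped by 'anaphora proper' category.
# Illustrative lookup only; not an exhaustive lexicon.
ANAPHOR_STEMS = {
    "pronominal": {"tad", "tat"},
    "correlative": {"yad", "yat"},
    "reflexive": {"sva", "svayam", "svayameva", "ātma", "ātman", "ātmanā"},
    "reciprocal": {"parasparam", "anyonyam"},
}


@dataclass
class Token:
    form: str          # surface form as it appears in the sentence
    stem: str          # hand-annotated stem (assumption)
    pos: str           # coarse part of speech: "noun", "pronoun", ...
    gender: str = ""   # "m", "f", "n" or empty when not applicable
    number: str = ""   # "sg", "du", "pl" or empty when not applicable


def classify_anaphor(stem: str) -> Optional[str]:
    """Return the anaphor category for a stem, or None if it is not listed."""
    for category, stems in ANAPHOR_STEMS.items():
        if stem in stems:
            return category
    return None


def find_antecedent(tokens: List[Token], idx: int) -> Optional[Token]:
    """Naive heuristic: nearest preceding noun agreeing in gender and number."""
    anaphor = tokens[idx]
    for candidate in reversed(tokens[:idx]):
        if (candidate.pos == "noun"
                and candidate.gender == anaphor.gender
                and candidate.number == anaphor.number):
            return candidate
    return None


# Example 1 from the text: rāmaḥ gṛham āgataḥ saḥ ca mayā dṛṣṭaḥ
EXAMPLE_1 = [
    Token("rāmaḥ", "rāma", "noun", "m", "sg"),
    Token("gṛham", "gṛha", "noun", "n", "sg"),
    Token("āgataḥ", "āgata", "participle", "m", "sg"),
    Token("saḥ", "tad", "pronoun", "m", "sg"),
    Token("ca", "ca", "indeclinable"),
    Token("mayā", "asmad", "pronoun", "", "sg"),
    Token("dṛṣṭaḥ", "dṛṣṭa", "participle", "m", "sg"),
]
```

With these annotations, `classify_anaphor("tad")` gives `"pronominal"`, and `find_antecedent(EXAMPLE_1, 3)` skips the neuter noun gṛham and returns the token for rāmaḥ, matching the index [i] in example 1. The correlative and discourse-level cases would need more than agreement checks.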

Publication date: 2008